Chinese Idiom Paraphrasing

Abstract

Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Because they are non-compositional and carry metaphorical meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase idiom-containing sentences as non-idiomatic ones under the premise of preserving the original sentence's meaning. Since sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation, Chinese idiom cloze, and Chinese idiom embeddings. In this study, we treat the CIP task as a special paraphrase generation task. To circumvent the difficulties of acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,529 sentence pairs. In addition to three sequence-to-sequence methods as baselines, we further propose a novel infill-based approach based on text infilling. The results show that the proposed method performs better than the baselines on the established CIP dataset.
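The infilling idea named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not describe their model, so a small hand-written lookup table (`IDIOM_GLOSSES`, hypothetical) stands in for a trained infilling model, and the `[MASK]` placeholder and all function names are assumptions made for illustration. The sketch only shows the shape of the task: locate the idiom, mask it, and fill the gap with non-idiomatic wording while leaving the rest of the sentence untouched.

```python
from typing import Dict, Optional

# Hypothetical placeholder token; a real infilling model defines its own.
MASK = "[MASK]"

# Hypothetical idiom -> plain-language gloss table. It stands in for a
# trained infilling model, which would generate the replacement from context.
IDIOM_GLOSSES: Dict[str, str] = {
    "一举两得": "同时得到两种好处",
    "画蛇添足": "做多余的事",
}


def find_idiom(sentence: str, glosses: Dict[str, str]) -> Optional[str]:
    """Return the first known idiom that occurs in the sentence, if any."""
    for idiom in glosses:
        if idiom in sentence:
            return idiom
    return None


def mask_idiom(sentence: str, idiom: str) -> str:
    """Replace the first occurrence of the idiom with the mask placeholder."""
    return sentence.replace(idiom, MASK, 1)


def fill_mask(masked: str, filler: str) -> str:
    """Insert the non-idiomatic filler where the mask placeholder sits."""
    return masked.replace(MASK, filler, 1)


def paraphrase(sentence: str, glosses: Dict[str, str]) -> str:
    """Mask the idiom and infill it; return the sentence unchanged if none is found."""
    idiom = find_idiom(sentence, glosses)
    if idiom is None:
        return sentence
    return fill_mask(mask_idiom(sentence, idiom), glosses[idiom])


if __name__ == "__main__":
    print(paraphrase("这样做一举两得。", IDIOM_GLOSSES))
```

Because only the masked span is rewritten, the surrounding context is preserved verbatim, which is one plausible reason to prefer infilling over regenerating the whole sentence.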

Related resources

Paraphrasing of Chinese Utterances

One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-based approach to paraphrasing is proposed for which only morphologi...

Approach to Spoken Chinese Paraphrasing Based on Feature Extraction

This paper presents an approach to spoken Chinese language paraphrasing based on feature extraction and techniques of language generation. In this approach, an input utterance is first analyzed in terms of phrase structure, dependency of chunks, etc., by using multiple methods. Then, the main features of the input utterance are extracted, and the extraction results are represented by a frame. F...

Improving Chinese Sentence Polarity Classification via Opinion Paraphrasing

While substantial work has been done on sentiment polarity classification to date, the lack of sufficient opinion-annotated corpora for reliable training is still a challenge. In this paper we propose to improve a support vector machine based polarity classifier by enriching both training data and test data via opinion paraphrasing. In particular, we first extract an equivalent set of attr...

Templatized Primitive Method Idiom

Template Method Pattern (see [5]) solves the problem of the existence of a generic algorithm for a family of classes that needs specialization in each and every concrete class. It does so by implementing the algorithm in the base class and by forwarding implementation details to (pure) virtual functions. In terms of [5] these forwarding functions are called primitive. In this paper we propose a...

The Idiom-Reference Connection

Idiom processing and reference resolution are two complex aspects of text processing that are commonly treated in isolation. However, closer study of the reference needs of some idioms suggests that these two phenomena will need to be treated together to support high-end NLP applications. Using evidence from Russian and English, this article describes a number of classes of idioms according to t...

Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2023

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00572